Abstract
This work presents a real-time, markerless virtual mouse system that replaces conventional hardware-based cursor control with hand gestures captured through a standard RGB webcam. The approach integrates the 21-point hand landmark model of MediaPipe Hands with a rule-based geometric gesture interpretation method to support cursor movement, left-click, right-click, and scrolling operations. Unlike learning-based gesture systems that require large training datasets and GPU-intensive processing, the proposed method employs lightweight geometric computations, enabling responsive performance on typical consumer hardware. The system architecture includes video capture, frame preprocessing, hand landmark extraction, finger articulation analysis, coordinate interpolation for screen mapping, motion smoothing, and gesture-to-event translation. Experimental evaluation shows gesture recognition accuracy above 92%, click reliability exceeding 90%, and stable performance above 25 frames per second under varying lighting conditions. These results demonstrate that the system offers an efficient and cost-effective solution for human–computer interaction, particularly in everyday computing, accessibility applications, and touchless interaction environments.
Introduction
The study proposes a deterministic, markerless virtual mouse system for touchless human–computer interaction using MediaPipe Hands and geometric analysis of 21 hand landmarks. Unlike traditional mice or deep-learning-based gesture systems, this approach operates in real time on a standard RGB webcam without specialized hardware, GPU training, or large datasets. Hand gestures are interpreted with rule-based geometric methods that detect finger states and joint angles, mapping four primary gestures (cursor movement, left click, right click, and scrolling) to mouse actions.
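As a concrete illustration of this rule-based interpretation step, the sketch below estimates per-finger extension states from MediaPipe's 21 hand landmarks and maps them to the four gestures. The landmark indices follow the standard MediaPipe Hands layout, but the specific finger combinations assigned to each gesture are illustrative assumptions rather than the system's exact rules.

```python
# Minimal sketch of geometric finger-state estimation and gesture classification.
# Landmark indices follow MediaPipe Hands: 4/8/12/16/20 are fingertips,
# 3/6/10/14/18 are the joints immediately below each tip.
# The gesture-to-action mapping below is an illustrative assumption.

TIP_IDS = [4, 8, 12, 16, 20]   # thumb, index, middle, ring, pinky tips
PIP_IDS = [3, 6, 10, 14, 18]   # joint below each tip


def finger_states(landmarks, handedness="Right"):
    """Return five booleans: True where the finger is extended.

    `landmarks` is the list of 21 normalized (x, y, z) landmarks that
    MediaPipe Hands produces for one detected hand (image coordinates,
    y increases downward).
    """
    states = []

    # Thumb: compare tip and IP joint along x, mirrored for left hands.
    if handedness == "Right":
        states.append(landmarks[4].x < landmarks[3].x)
    else:
        states.append(landmarks[4].x > landmarks[3].x)

    # Remaining fingers: a finger counts as "up" when its tip lies above its PIP joint.
    for tip, pip in zip(TIP_IDS[1:], PIP_IDS[1:]):
        states.append(landmarks[tip].y < landmarks[pip].y)

    return states


def classify_gesture(states):
    """Map finger states to one of the four gestures described above (assumed combinations)."""
    thumb, index, middle, ring, pinky = states
    if index and not middle and not ring and not pinky:
        return "move"         # index finger only -> cursor movement
    if index and middle and not ring and not pinky:
        return "left_click"   # index + middle -> left click
    if thumb and index and not middle:
        return "right_click"  # thumb + index -> right click
    if index and middle and ring and pinky:
        return "scroll"       # open hand -> scrolling
    return "idle"
```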
The system employs a modular pipeline: video acquisition, preprocessing, hand detection, landmark extraction, finger state estimation, gesture recognition, cursor mapping, and OS-level event execution. Stability is enhanced via a reduced interaction region, low-pass smoothing, and cooldown intervals, ensuring smooth, responsive cursor control and minimizing false triggers.
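The sketch below ties these pipeline stages together in a single capture loop, reusing the finger-state and gesture helpers from the previous sketch. It assumes OpenCV for acquisition, MediaPipe Hands for landmark extraction, and pyautogui for OS-level event execution; the interaction-region margin, smoothing factor, scroll step, and click cooldown are illustrative values, not the system's tuned parameters.

```python
# Hedged end-to-end sketch: capture -> preprocessing -> landmarks -> gesture ->
# screen mapping with smoothing -> OS-level mouse event. Parameter values are assumptions.
import time

import cv2
import mediapipe as mp
import pyautogui

CAM_W, CAM_H = 640, 480   # capture resolution
MARGIN = 100              # reduced interaction region, in pixels
SMOOTH = 5                # low-pass smoothing divisor
COOLDOWN = 0.4            # seconds between click events

screen_w, screen_h = pyautogui.size()
hands = mp.solutions.hands.Hands(max_num_hands=1, min_detection_confidence=0.7)

cap = cv2.VideoCapture(0)
cap.set(cv2.CAP_PROP_FRAME_WIDTH, CAM_W)
cap.set(cv2.CAP_PROP_FRAME_HEIGHT, CAM_H)

prev_x, prev_y = 0.0, 0.0
last_click = 0.0

while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    frame = cv2.flip(frame, 1)                    # mirror for natural control
    rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)  # preprocessing for MediaPipe
    result = hands.process(rgb)                   # hand detection + landmark extraction

    if result.multi_hand_landmarks:
        lm = result.multi_hand_landmarks[0].landmark
        gesture = classify_gesture(finger_states(lm))  # see previous sketch

        # Index fingertip, interpolated from the reduced region to screen coordinates.
        ix, iy = lm[8].x * CAM_W, lm[8].y * CAM_H
        tx = (ix - MARGIN) / (CAM_W - 2 * MARGIN) * screen_w
        ty = (iy - MARGIN) / (CAM_H - 2 * MARGIN) * screen_h

        # Low-pass smoothing to suppress jitter.
        cur_x = prev_x + (tx - prev_x) / SMOOTH
        cur_y = prev_y + (ty - prev_y) / SMOOTH

        if gesture == "move":
            pyautogui.moveTo(cur_x, cur_y)
        elif gesture in ("left_click", "right_click") and time.time() - last_click > COOLDOWN:
            pyautogui.click(button="left" if gesture == "left_click" else "right")
            last_click = time.time()
        elif gesture == "scroll":
            pyautogui.scroll(-40)                 # assumed scroll step per frame

        prev_x, prev_y = cur_x, cur_y

    cv2.imshow("Virtual Mouse", frame)
    if cv2.waitKey(1) & 0xFF == 27:               # ESC to exit
        break

cap.release()
cv2.destroyAllWindows()
```

In this arrangement, the cooldown interval keeps a held click gesture from firing repeated events, while the smoothing divisor trades responsiveness against jitter suppression.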
Evaluation results show high accuracy for gesture recognition (cursor ~95%, left click ~92%, right click ~90%, scroll ~88%) and stable performance under varying hand positions, rotations, and partial occlusions, with real-time operation at 25–40 FPS. Overall, the framework provides a simple, cost-effective, and reliable touchless input method suitable for sterile environments, AR/VR applications, and accessibility-focused computing.
Conclusion
The proposed real-time AI Virtual Mouse framework demonstrates the effectiveness of markerless hand gesture recognition as a viable alternative to conventional hardware-based pointing devices. By integrating the hand landmark detection capabilities of MediaPipe Hands with a deterministic gesture interpretation pipeline and an optimized cursor mapping strategy, the system successfully recognizes essential interaction gestures while delivering smooth and responsive pointer control suitable for everyday computing tasks.
The implementation of a reduced interaction workspace, combined with coordinate interpolation and low-pass motion filtering, plays a key role in improving cursor stability and minimizing jitter. These mechanisms allow users to perform both rapid movements and precise adjustments while maintaining controlled pointer behavior. Experimental observations indicate that the system maintains reliable operation across different lighting conditions, varied backgrounds, and natural differences in user hand structure, while sustaining consistent frame rates on standard consumer hardware.
User feedback further suggests that the gesture set is intuitive and can be learned quickly with minimal adaptation time. As a result, the system proved capable of supporting common computer operations such as cursor navigation, clicking interface elements, and scrolling through digital content.
Overall, the proposed approach demonstrates that contactless gesture-based interfaces can effectively replicate the core functionality of traditional mouse input devices. The system’s robustness, computational efficiency, and practical usability provide a strong foundation for future developments in natural human–computer interaction and touchless control environments.
Looking ahead, the framework can be extended by incorporating additional gesture vocabularies, adaptive sensitivity mechanisms, and multi-hand interaction capabilities to support more complex interaction scenarios. Future improvements may also include integration with augmented or virtual reality systems, as well as the exploration of hybrid approaches that combine geometric reasoning with lightweight learning models. Such enhancements could further expand the applicability of gesture-driven interfaces in domains such as immersive computing, assistive technologies, and smart interactive environments.